{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 使用Logistic Regression做CTR预估"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's say that you're a major search engine, and you need to decide which ad to display at the top of your search results. How would you do it?\n",
    "\n",
    "Your first thought might be to narrow the scope to ads \"related\" to the search, and then choose whichever ad offers the greatest revenue. Companies have already bid on how much they will pay you, so it seems easy to maximize your revenue by choosing the highest paying ad. But is that the right approach?\n",
    "\n",
    "Many ads are actually sold on a \"pay-per-click\" (PPC) basis, meaning the company only pays for ad clicks, not ad views. Thus your optimal approach (as a search engine) is actually to choose an ad based on \"expected value\", meaning the price of a click times the likelihood that the ad will be clicked. In other words, a \\$1.00 ad with a 5% probability of being clicked has an expected value of \\$0.05, whereas a \\$2.00 ad with a 1% probability of being clicked has an expected value of only \\$0.02. In this case, you would choose to display the first ad.\n",
    "\n",
    "In order for you to maximize expected value, you therefore need to accurately predict the likelihood that a given ad will be clicked, also known as \"click-through rate\" (CTR)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this notebook, I'll walk through the predictive modeling process, discuss why logistic regression is a good choice for this task, and then explain this code line-by-line so that you can apply it to your own predictive task!"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For this example, I'm using the data from a [Kaggle competition](https://www.kaggle.com/c/avazu-ctr-prediction) on click-through rate prediction sponsored by Avazu. The goal in the competition matches our goal, which is to predict the likelihood that a given ad will be clicked."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 1: Reading and Exploring the Data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "I've already downloaded the dataset from Kaggle for this example and extracted a small subset to make my calculations faster. If you would like to follow along, you should download and decompress `train.gz` from the [competition's data page](https://www.kaggle.com/c/avazu-ctr-prediction/data) (login required), and then extract the first 100,000 lines from `train.csv` using this command at the command line/terminal: `head -n100000 train.csv > train_subset.csv`\n",
    "\n",
    "Our first step is to read the data into an SFrame, which is GraphLab's tabular data structure that is similar to a data frame in R or a pandas DataFrame in Python.\n",
    "\n",
    "This data happens to be stored in the popular CSV (comma separated value) format, but SFrames can be constructed from a variety of [sources](https://turi.com/products/create/docs/graphlab.data_structures.html#connectors). We'll use the [read_csv](https://turi.com/products/create/docs/generated/graphlab.SFrame.read_csv.html) method to read in the data:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "[INFO] graphlab.cython.cy_server: GraphLab Create v2.1 started. Logging: /tmp/graphlab_server_1539710428.log\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "This non-commercial license of GraphLab Create for academic use is assigned to songtaoxy@outlook.com and will expire on October 16, 2019.\n"
     ]
    }
   ],
   "source": [
    "# import pandas as pd\n",
    "# data = pandas.read_csv('train_subset.csv', verbose=False)\n",
    "# data = pd.read_csv('/Users/songtao/personaldriveMac/ai_project/ai_csdn_20180917/ctr_prediction/datasets/train/train', verbose=False)\n",
    "import graphlab as gl\n",
    "# data = gl.SFrame.read_csv('../datasets/train/train', verbose=False)\n",
    "data = gl.SFrame.read_csv('/Users/songtao/personaldriveMac/ai_project/ai_csdn_20180917/ctr_prediction/datasets/train/train', verbose=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's take a quick look at the first row of data, to see what we're working with:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\"><table frame=\"box\" rules=\"cols\">\n",
       "    <tr>\n",
       "        <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">id</th>\n",
       "        <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">click</th>\n",
       "        <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">hour</th>\n",
       "        <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">C1</th>\n",
       "        <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">banner_pos</th>\n",
       "        <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">site_id</th>\n",
       "        <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">site_domain</th>\n",
       "        <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">site_category</th>\n",
       "        <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">app_id</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1000009418151094273</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">14102100</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1005</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1fbe01fe</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">f3845767</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">28905ebd</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">ecad2386</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">10000169349117863715</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">14102100</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1005</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1fbe01fe</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">f3845767</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">28905ebd</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">ecad2386</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">10000371904215119486</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">14102100</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1005</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1fbe01fe</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">f3845767</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">28905ebd</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">ecad2386</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">10000640724480838376</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">14102100</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1005</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1fbe01fe</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">f3845767</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">28905ebd</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">ecad2386</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">10000679056417042096</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">14102100</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1005</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">fe8cc448</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">9166c161</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0569f928</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">ecad2386</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">10000720757801103869</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">14102100</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1005</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">d6137915</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">bb1ef334</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">f028772b</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">ecad2386</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">10000724729988544911</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">14102100</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1005</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">8fda644b</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">25d4cfcd</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">f028772b</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">ecad2386</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">10000918755742328737</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">14102100</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1005</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">e151e245</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">7e091613</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">f028772b</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">ecad2386</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">10000949271186029916</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">14102100</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1005</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1fbe01fe</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">f3845767</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">28905ebd</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">ecad2386</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">10001264480619467364</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">14102100</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1002</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">84c7ba46</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">c4e18dd6</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">50e219e0</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">ecad2386</td>\n",
       "    </tr>\n",
       "</table>\n",
       "<table frame=\"box\" rules=\"cols\">\n",
       "    <tr>\n",
       "        <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">app_domain</th>\n",
       "        <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">app_category</th>\n",
       "        <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">device_id</th>\n",
       "        <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">device_ip</th>\n",
       "        <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">device_model</th>\n",
       "        <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">device_type</th>\n",
       "        <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">device_conn_type</th>\n",
       "        <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">C14</th>\n",
       "        <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">C15</th>\n",
       "        <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">C16</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">7801e8d9</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">07d7df22</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">a99f214a</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">ddd2926e</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">44956a24</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">2</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">15706</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">320</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">50</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">7801e8d9</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">07d7df22</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">a99f214a</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">96809ac8</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">711ee120</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">15704</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">320</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">50</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">7801e8d9</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">07d7df22</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">a99f214a</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">b3cf8def</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">8a4875bd</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">15704</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">320</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">50</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">7801e8d9</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">07d7df22</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">a99f214a</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">e8275b8f</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">6332421a</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">15706</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">320</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">50</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">7801e8d9</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">07d7df22</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">a99f214a</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">9644d0bf</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">779d90c2</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">18993</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">320</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">50</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">7801e8d9</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">07d7df22</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">a99f214a</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">05241af0</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">8a4875bd</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">16920</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">320</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">50</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">7801e8d9</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">07d7df22</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">a99f214a</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">b264c159</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">be6db1d7</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">20362</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">320</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">50</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">7801e8d9</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">07d7df22</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">a99f214a</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">e6f67278</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">be74e6fe</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">20632</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">320</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">50</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">7801e8d9</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">07d7df22</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">a99f214a</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">37e8da74</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">5db079b5</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">2</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">15707</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">320</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">50</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">7801e8d9</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">07d7df22</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">c357dbff</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">f1ac7184</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">373ecbe6</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">21689</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">320</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">50</td>\n",
       "    </tr>\n",
       "</table>\n",
       "<table frame=\"box\" rules=\"cols\">\n",
       "    <tr>\n",
       "        <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">C17</th>\n",
       "        <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">C18</th>\n",
       "        <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">C19</th>\n",
       "        <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">C20</th>\n",
       "        <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">C21</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1722</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">35</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">-1</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">79</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1722</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">35</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">100084</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">79</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1722</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">35</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">100084</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">79</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1722</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">35</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">100084</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">79</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">2161</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">35</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">-1</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">157</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1899</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">431</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">100077</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">117</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">2333</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">39</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">-1</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">157</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">2374</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">3</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">39</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">-1</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">23</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1722</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">35</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">-1</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">79</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">2496</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">3</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">167</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">100191</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">23</td>\n",
       "    </tr>\n",
       "</table>\n",
       "[10 rows x 24 columns]<br/>\n",
       "</div>"
      ],
      "text/plain": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\"><table frame=\"box\" rules=\"cols\">\n",
       "    <tr>\n",
       "        <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">id</th>\n",
       "        <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">click</th>\n",
       "        <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">hour</th>\n",
       "        <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">C1</th>\n",
       "        <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">banner_pos</th>\n",
       "        <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">site_id</th>\n",
       "        <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">site_domain</th>\n",
       "        <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">site_category</th>\n",
       "        <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">app_id</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1000009418151094273</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">14102100</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1005</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1fbe01fe</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">f3845767</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">28905ebd</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">ecad2386</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">10000169349117863715</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">14102100</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1005</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1fbe01fe</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">f3845767</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">28905ebd</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">ecad2386</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">10000371904215119486</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">14102100</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1005</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1fbe01fe</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">f3845767</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">28905ebd</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">ecad2386</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">10000640724480838376</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">14102100</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1005</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1fbe01fe</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">f3845767</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">28905ebd</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">ecad2386</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">10000679056417042096</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">14102100</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1005</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">fe8cc448</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">9166c161</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0569f928</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">ecad2386</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">10000720757801103869</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">14102100</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1005</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">d6137915</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">bb1ef334</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">f028772b</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">ecad2386</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">10000724729988544911</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">14102100</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1005</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">8fda644b</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">25d4cfcd</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">f028772b</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">ecad2386</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">10000918755742328737</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">14102100</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1005</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">e151e245</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">7e091613</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">f028772b</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">ecad2386</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">10000949271186029916</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">14102100</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1005</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1fbe01fe</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">f3845767</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">28905ebd</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">ecad2386</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">10001264480619467364</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">14102100</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1002</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">84c7ba46</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">c4e18dd6</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">50e219e0</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">ecad2386</td>\n",
       "    </tr>\n",
       "</table>\n",
       "<table frame=\"box\" rules=\"cols\">\n",
       "    <tr>\n",
       "        <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">app_domain</th>\n",
       "        <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">app_category</th>\n",
       "        <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">device_id</th>\n",
       "        <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">device_ip</th>\n",
       "        <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">device_model</th>\n",
       "        <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">device_type</th>\n",
       "        <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">device_conn_type</th>\n",
       "        <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">C14</th>\n",
       "        <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">C15</th>\n",
       "        <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">C16</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">7801e8d9</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">07d7df22</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">a99f214a</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">ddd2926e</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">44956a24</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">2</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">15706</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">320</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">50</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">7801e8d9</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">07d7df22</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">a99f214a</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">96809ac8</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">711ee120</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">15704</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">320</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">50</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">7801e8d9</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">07d7df22</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">a99f214a</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">b3cf8def</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">8a4875bd</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">15704</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">320</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">50</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">7801e8d9</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">07d7df22</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">a99f214a</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">e8275b8f</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">6332421a</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">15706</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">320</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">50</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">7801e8d9</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">07d7df22</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">a99f214a</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">9644d0bf</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">779d90c2</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">18993</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">320</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">50</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">7801e8d9</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">07d7df22</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">a99f214a</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">05241af0</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">8a4875bd</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">16920</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">320</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">50</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">7801e8d9</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">07d7df22</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">a99f214a</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">b264c159</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">be6db1d7</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">20362</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">320</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">50</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">7801e8d9</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">07d7df22</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">a99f214a</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">e6f67278</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">be74e6fe</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">20632</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">320</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">50</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">7801e8d9</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">07d7df22</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">a99f214a</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">37e8da74</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">5db079b5</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">2</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">15707</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">320</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">50</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">7801e8d9</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">07d7df22</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">c357dbff</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">f1ac7184</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">373ecbe6</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">21689</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">320</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">50</td>\n",
       "    </tr>\n",
       "</table>\n",
       "<table frame=\"box\" rules=\"cols\">\n",
       "    <tr>\n",
       "        <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">C17</th>\n",
       "        <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">C18</th>\n",
       "        <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">C19</th>\n",
       "        <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">C20</th>\n",
       "        <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">C21</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1722</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">35</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">-1</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">79</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1722</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">35</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">100084</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">79</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1722</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">35</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">100084</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">79</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1722</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">35</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">100084</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">79</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">2161</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">35</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">-1</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">157</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1899</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">431</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">100077</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">117</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">2333</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">39</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">-1</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">157</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">2374</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">3</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">39</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">-1</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">23</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1722</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">35</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">-1</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">79</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">2496</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">3</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">167</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">100191</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">23</td>\n",
       "    </tr>\n",
       "</table>\n",
       "[10 rows x 24 columns]<br/>\n",
       "</div>"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data.head()\n",
    "\n",
    "# print(data.head())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "From Kaggle's [data dictionary](https://www.kaggle.com/c/avazu-ctr-prediction/data), I know that click=0 means the ad was not clicked, and click=1 means the ad was clicked. The \"click\" column is therefore our target variable, and the other columns are our potential features!\n",
    "\n",
    "The first thing we want to know is what percentage of ads in the dataset were actually clicked. In this case, we can simply take the mean of the \"click\" column, since that is equivalent to adding up all of the ones (which is the number of clicks) and dividing by the total number of ads:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.16980562476404382"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data['click'].mean()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We see that 17.5% of the ads were clicked, meaning the overall click-through rate is 17.5%. This is useful to keep in mind as a \"baseline\", as we'll see later on.\n",
    "\n",
    "Before we start building a machine learning model, it's always useful to explore the dataset. One way to get started is by using the GraphLab Canvas, a browser-based visualization platform:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "gl.canvas.set_target('ipynb')\n",
    "data.show()\n",
    "\n",
    "\n",
    "# print(data.describe())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "I noticed that \"device_type\" only has 4 unique values, and it makes intuitive sense that the type of device you're using when viewing an ad might affect your likelihood of clicking the ad, so let's explore it further.\n",
    "\n",
    "To understand the relationship between this feature and the target variable, we want to calculate the click-through rate for each value of device_type. We can accomplish this by \"grouping the data\" by device_type, and then calculating the mean of the click column for each group:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\"><table frame=\"box\" rules=\"cols\">\n",
       "    <tr>\n",
       "        <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">device_type</th>\n",
       "        <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">CTR</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">2</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0.0645161290323</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0.210731480197</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0.169175776318</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">5</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0.093842164338</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">4</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0.0954444949578</td>\n",
       "    </tr>\n",
       "</table>\n",
       "[5 rows x 2 columns]<br/>\n",
       "</div>"
      ],
      "text/plain": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\"><table frame=\"box\" rules=\"cols\">\n",
       "    <tr>\n",
       "        <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">device_type</th>\n",
       "        <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">CTR</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">2</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0.0645161290323</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0.210731480197</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0.169175776318</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">5</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0.093842164338</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">4</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0.0954444949578</td>\n",
       "    </tr>\n",
       "</table>\n",
       "[5 rows x 2 columns]<br/>\n",
       "</div>"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data.groupby('device_type', {'CTR':gl.aggregate.MEAN('click')})"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We saw earlier that the baseline click-through rate is 17.5%, and it appears that there is a big difference in average click-through rate depending on device_type. This looks like a good feature!\n",
    "\n",
    "Similarly, the C1 column looks like a good feature:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\"><table frame=\"box\" rules=\"cols\">\n",
       "    <tr>\n",
       "        <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">C1</th>\n",
       "        <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">CTR</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1002</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0.210731480197</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1007</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0.0394289598912</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1005</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0.169330882684</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1001</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0.0333932156821</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1008</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0.121651978573</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1012</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0.172492776094</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1010</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0.0952153782637</td>\n",
       "    </tr>\n",
       "</table>\n",
       "[7 rows x 2 columns]<br/>\n",
       "</div>"
      ],
      "text/plain": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\"><table frame=\"box\" rules=\"cols\">\n",
       "    <tr>\n",
       "        <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">C1</th>\n",
       "        <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">CTR</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1002</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0.210731480197</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1007</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0.0394289598912</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1005</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0.169330882684</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1001</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0.0333932156821</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1008</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0.121651978573</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1012</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0.172492776094</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1010</td>\n",
       "        <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0.0952153782637</td>\n",
       "    </tr>\n",
       "</table>\n",
       "[7 rows x 2 columns]<br/>\n",
       "</div>"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data.groupby('C1', {'CTR':gl.aggregate.MEAN('click')})"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "I also noticed that C15 and C16 appear to be the dimensions of the ad (width and height), which we would also imagine are good predictors of whether an ad is clicked:\n",
    "\n",
    "统计频次\n",
    "将某些频次低的作为单独的一列来处理. 低频的作为一类, 否则做onehot编码, 维度会暴涨\n",
    "a:b-> 某个项:该项出现的次数"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{120: 3069,\n 216: 298794,\n 300: 2337294,\n 320: 37708959,\n 480: 2137,\n 728: 74533,\n 768: 1621,\n 1024: 2560}"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data['C15'].sketch_summary().frequent_items()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{20: 3069,\n 36: 298794,\n 50: 38136554,\n 90: 74533,\n 250: 1806334,\n 320: 2137,\n 480: 103365,\n 768: 2560,\n 1024: 1621}"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data['C16'].sketch_summary().frequent_items()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For our initial model, we'll just use device_type, C1, C15, and C16 as our features.\n",
    "\n",
    "Note that when we built the SFrame from the CSV file, it simply guessed the data type of each column. Sometimes these data types need to be adjusted, so let's take a quick look at the column names and their associated types to see if there's anything we need to fix:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "collapsed": true,
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[('id', str),\n ('click', int),\n ('hour', int),\n ('C1', int),\n ('banner_pos', int),\n ('site_id', str),\n ('site_domain', str),\n ('site_category', str),\n ('app_id', str),\n ('app_domain', str),\n ('app_category', str),\n ('device_id', str),\n ('device_ip', str),\n ('device_model', str),\n ('device_type', int),\n ('device_conn_type', int),\n ('C14', int),\n ('C15', int),\n ('C16', int),\n ('C17', int),\n ('C18', int),\n ('C19', int),\n ('C20', int),\n ('C21', int)]"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "zip(data.column_names(), data.column_types())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We know that both device_type and C1 are \"categorical variables\", meaning that their numerical values represent categories. We'll convert the data type of both of those columns from integer to string, because we don't want our machine learning model to think there is a mathematical relationship between the category values:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "data['device_type'] = data['C1'].astype(str)\n",
    "data['C1'] = data['C1'].astype(str)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You could spend a lot more time on the exploratory phase, but let's move along to the next step in predictive modeling! If you want to learn how to manipulate SFrames in more depth, read through this example notebook, [Introduction to SFrames](https://turi.com/learn/gallery/notebooks/introduction_to_sframes.html)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 2: Splitting the Data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One of the keys to proper machine learning is model evaluation. The goal of model evaluation is to estimate how well your model will \"generalize\" to future data. In other words, we want to build a model that accurately predicts the future, not the past!\n",
    "\n",
    "One of the most common evaluation procedures is to split your data into a \"training set\" and a \"testing set\". \n",
    "\n",
    "Let's use an 80/20 split, in which 80% of the data is used for training and 20% is used for testing:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "train_data, test_data = data.random_split(0.8, seed=1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We now have two separate SFrames, called train_data and test_data."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 3: Selecting a Machine Learning Model"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "There are two main types of models: classification models, which are used when your target variable is categorical (such as yes/no), and regression models, which are used when your target variable is continuous (such as price). In this case, we'll need to use a classification model, since our target variable is categorical (click: yes or no).\n",
    "\n",
    "The specific model we're going to use in this case is logistic regression. In logistic regression, the probability that the target is True is modeled as a logistic function of a linear combination of the features. Thus, the model is predicting a probability (which is a continuous value), but that probability is used to choose the predicted target class. In other words, it's using regression to predict a continuous value, but we're using the continuous value that is output from the model to perform classification. (Pretty cool, right?)\n",
    "\n",
    "It can take a lot of study to truly understand a machine learning model, but a good introduction to logistic regression is available in the [user guide](https://turi.com/learn/userguide/supervised-learning/logistic-regression.html)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "So, why exactly did we choose logistic regression for this task, instead of any of the other [available classification models](https://turi.com/learn/userguide/supervised-learning/classifier.html)? Well, it turns out that logistic regression has many nice properties. For starters, it is a very fast model, meaning that it does not take long to train the model or make predictions. As well, it is highly interpretable, meaning that you can understand exactly how it's making predictions. But the key consideration in this case is that logistic regression outputs \"well-calibrated\" predicted probabilities. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 4: Training a Machine Learning Model"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now that we've selected our model, we can finally start the model training process! In GraphLab Create, this can be done in a single line. You simply pass in the training data, the name of the target column, and the names of the feature columns. And in fact, if you just replace `gl.logistic_classifier.create` with `gl.classifier.create`, GraphLab will choose the best model for you automatically (based on the properties of your data), meaning that you can skip Step 3 above!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {
    "collapsed": false,
    "scrolled": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<pre>SUCCESS: Optimal solution found.</pre>"
      ],
      "text/plain": [
       "<pre>SUCCESS: Optimal solution found.</pre>"
      ]
     },
     "execution_count": 0,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "PROGRESS: Creating a validation set from 5 percent of training data. This may take a while.\n          You can set ``validation_set=None`` to disable validation tracking.\n\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<pre></pre>"
      ],
      "text/plain": [
       "<pre></pre>"
      ]
     },
     "execution_count": 0,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "text/html": [
       "<pre>Logistic regression:</pre>"
      ],
      "text/plain": [
       "<pre>Logistic regression:</pre>"
      ]
     },
     "execution_count": 0,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "text/html": [
       "<pre>--------------------------------------------------------</pre>"
      ],
      "text/plain": [
       "<pre>--------------------------------------------------------</pre>"
      ]
     },
     "execution_count": 0,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "text/html": [
       "<pre>Number of examples          : 30727675</pre>"
      ],
      "text/plain": [
       "<pre>Number of examples          : 30727675</pre>"
      ]
     },
     "execution_count": 0,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "text/html": [
       "<pre>Number of classes           : 2</pre>"
      ],
      "text/plain": [
       "<pre>Number of classes           : 2</pre>"
      ]
     },
     "execution_count": 0,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "text/html": [
       "<pre>Number of feature columns   : 4</pre>"
      ],
      "text/plain": [
       "<pre>Number of feature columns   : 4</pre>"
      ]
     },
     "execution_count": 0,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "text/html": [
       "<pre>Number of unpacked features : 4</pre>"
      ],
      "text/plain": [
       "<pre>Number of unpacked features : 4</pre>"
      ]
     },
     "execution_count": 0,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "text/html": [
       "<pre>Number of coefficients    : 15</pre>"
      ],
      "text/plain": [
       "<pre>Number of coefficients    : 15</pre>"
      ]
     },
     "execution_count": 0,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "text/html": [
       "<pre>Starting Newton Method</pre>"
      ],
      "text/plain": [
       "<pre>Starting Newton Method</pre>"
      ]
     },
     "execution_count": 0,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "text/html": [
       "<pre>--------------------------------------------------------</pre>"
      ],
      "text/plain": [
       "<pre>--------------------------------------------------------</pre>"
      ]
     },
     "execution_count": 0,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "text/html": [
       "<pre>+-----------+----------+--------------+-------------------+---------------------+</pre>"
      ],
      "text/plain": [
       "<pre>+-----------+----------+--------------+-------------------+---------------------+</pre>"
      ]
     },
     "execution_count": 0,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "text/html": [
       "<pre>| Iteration | Passes   | Elapsed Time | Training-accuracy | Validation-accuracy |</pre>"
      ],
      "text/plain": [
       "<pre>| Iteration | Passes   | Elapsed Time | Training-accuracy | Validation-accuracy |</pre>"
      ]
     },
     "execution_count": 0,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "text/html": [
       "<pre>+-----------+----------+--------------+-------------------+---------------------+</pre>"
      ],
      "text/plain": [
       "<pre>+-----------+----------+--------------+-------------------+---------------------+</pre>"
      ]
     },
     "execution_count": 0,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "text/html": [
       "<pre>| 1         | 2        | 11.411917    | 0.828745          | 0.828693            |</pre>"
      ],
      "text/plain": [
       "<pre>| 1         | 2        | 11.411917    | 0.828745          | 0.828693            |</pre>"
      ]
     },
     "execution_count": 0,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "text/html": [
       "<pre>| 2         | 3        | 18.462283    | 0.828745          | 0.828693            |</pre>"
      ],
      "text/plain": [
       "<pre>| 2         | 3        | 18.462283    | 0.828745          | 0.828693            |</pre>"
      ]
     },
     "execution_count": 0,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "text/html": [
       "<pre>| 3         | 4        | 25.570126    | 0.829122          | 0.829065            |</pre>"
      ],
      "text/plain": [
       "<pre>| 3         | 4        | 25.570126    | 0.829122          | 0.829065            |</pre>"
      ]
     },
     "execution_count": 0,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "text/html": [
       "<pre>| 4         | 5        | 32.306339    | 0.829122          | 0.829065            |</pre>"
      ],
      "text/plain": [
       "<pre>| 4         | 5        | 32.306339    | 0.829122          | 0.829065            |</pre>"
      ]
     },
     "execution_count": 0,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "text/html": [
       "<pre>| 5         | 6        | 38.881778    | 0.829122          | 0.829065            |</pre>"
      ],
      "text/plain": [
       "<pre>| 5         | 6        | 38.881778    | 0.829122          | 0.829065            |</pre>"
      ]
     },
     "execution_count": 0,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "text/html": [
       "<pre>| 6         | 7        | 46.276315    | 0.829122          | 0.829065            |</pre>"
      ],
      "text/plain": [
       "<pre>| 6         | 7        | 46.276315    | 0.829122          | 0.829065            |</pre>"
      ]
     },
     "execution_count": 0,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "text/html": [
       "<pre>+-----------+----------+--------------+-------------------+---------------------+</pre>"
      ],
      "text/plain": [
       "<pre>+-----------+----------+--------------+-------------------+---------------------+</pre>"
      ]
     },
     "execution_count": 0,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "model = gl.logistic_classifier.create(train_data, target='click', features=['device_type', 'C1', 'C15', 'C16'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note that we didn't have to tell GraphLab how to handle each of the features, even though two features were numerical and the other two were categorical. The categorical features were automatically handled using \"dummy encoding\", which is why the output above indicates that there were 4 features but 13 model coefficients. (A simple explanation of dummy encoding is available in the [user guide](https://turi.com/learn/userguide/supervised-learning/linear-regression.html#linregr-categorical-features).)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 5: Making Predictions"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "After training a model, the final step is to use the model to make predictions. In other words, the model has learned a mathematical relationship between the features and the target, and it will use that relationship to predict the target value for new data points.\n",
    "\n",
    "In this case, we pass the testing data to the \"fitted model\", and ask it to output the predicted probability of a click:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "dtype: float\nRows: 5\n[0.15783017896514664, 0.2097657129491405, 0.15783017896514664, 0.15783017896514664, 0.15783017896514664]"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "model.predict(test_data, output_type='probability').head(5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "At this point, you would want to evaluate the model by comparing the predicted probabilities versus the actual target values, using an appropriate \"evaluation metric.\" The best metric to use in this case is probably [logarithmic loss](https://www.kaggle.com/wiki/LogarithmicLoss), which is commonly used when you care about having well-calibrated probabilities. In addition, you might inspect the ROC curve and compute other metrics such as the F1-score and AUC. (See this [blog post](http://blog.turi.com/how-to-evaluate-machine-learning-models-part-2a-classification-metrics) for more details.)\n",
    "\n",
    "Now that we have these probabilities, we could find the ad that maximizes revenue by multiplying these probabilities by the cost-per-click, and finding the largest value.\n",
    "\n",
    "Although we're at the end of this notebook, this is really just the beginning! You can continue to add more features to the model, and then use the evaluation metric to compare the expected performance of each of your models. As well, you can use [feature engineering](https://turi.com/learn/gallery/notebooks/feature-engineering.html) to create new features, you can try other models, and so much more!\n",
    "\n",
    "If you'd like to read more about click-through rate prediction, there are readable papers on the topic by both [Criteo](http://people.csail.mit.edu/romer/papers/TISTRespPredAds.pdf) and [Google](http://static.googleusercontent.com/media/research.google.com/ru//pubs/archive/41159.pdf)."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
